Abstract

Code decompositions (a.k.a. code nestings) are used to design good binary polar code kernels. The
proposed kernels are in general non-linear and show a better rate of polarization under successive
cancelation decoding than the ones suggested by Korada et al., for the same kernel dimensions. In
particular, kernels of sizes 14, 15 and 16 are constructed and shown to provide polarization rates
better than any binary linear kernel of such sizes.

1 Introduction 

Polar codes were introduced by Arikan [1] and provided a scheme for achieving the symmetric capacity
of binary memoryless channels (B-MC) with polynomial encoding and decoding complexity. Arikan used
a simple construction based on the following linear kernel

$$G_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$

In this scheme, a $2^n \times 2^n$ matrix, $G_2^{\otimes n}$, is generated by performing the Kronecker power on $G_2$. An input
vector $u$ of length $N = 2^n$ is transformed to an $N$ length vector $x$ by multiplying a certain permutation
of the vector $u$ by $G_2^{\otimes n}$. The vector $x$ is transmitted through $N$ independent copies of the memoryless
channel, $W$. This results in $N$ new (dependent) channels between the individual components of $u$ and the
outputs of the channels. Arikan showed that these channels exhibit the phenomenon of polarization under
successive cancelation decoding. This means that as $n$ grows there is a proportion of $I(W)$ (the symmetric
channel capacity) of the channels that become clean channels (i.e. having the capacity approaching 1) and
the rest of the channels become completely noisy (i.e. with the capacity approaching 0). An important
question is how fast the polarization occurs in terms of the codes' length $N$. In [2], the rate of polarization

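was analyzed for the $2 \times 2$ kernel, and it was proven that the rate is $O(2^{-N^{\beta}})$ for $\beta < 0.5$. More specifically, the
authors showed that

$$\liminf_{n \to \infty} \Pr\left(Z_n \le 2^{-N^{\beta}}\right) = I(W) \quad \text{for } \beta < 0.5 \tag{1}$$

$$\liminf_{n \to \infty} \Pr\left(Z_n \ge 2^{-N^{\beta}}\right) = 1 \quad \text{for } \beta > 0.5, \tag{2}$$

where $\{Z_n\}_{n \ge 0}$ is the Bhattacharyya random sequence corresponding to Arikan's random tree process [1].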
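As a rough illustration of the Kronecker-power construction described above, the following Python sketch builds $G_2^{\otimes n}$ and multiplies an input vector by it over GF(2). The helper names (`kron`, `kron_power`, `encode`) are illustrative choices and do not come from [1]; the permutation applied to $u$ before the multiplication is omitted here for brevity.

```python
# Minimal sketch (assumed helper names): Kronecker powers of Arikan's kernel
# and multiplication u * G over GF(2). The input permutation is omitted.

G2 = [[1, 0],
      [1, 1]]


def kron(a, b):
    """Kronecker product of two 0/1 matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def kron_power(g, n):
    """n-fold Kronecker power of g (n >= 0)."""
    result = [[1]]
    for _ in range(n):
        result = kron(result, g)
    return result


def encode(u, g):
    """Row vector u times matrix g, with arithmetic modulo 2."""
    return [sum(u[i] * g[i][j] for i in range(len(u))) % 2
            for j in range(len(g[0]))]


if __name__ == "__main__":
    g4 = kron_power(G2, 2)
    for row in g4:
        print(row)
    print(encode([1, 1, 0, 0], g4))
```

For $n = 2$ the resulting matrix has rows 1000, 1100, 1010 and 1111.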

In [3], Korada et al. studied the use of alternatives to $G_2$ for the symmetric B-MC. They gave sufficient
conditions for polarization when linear binary kernels are used over the symmetric B-MC channels.
Furthermore, the notion of the rate of polarization was generalized for polar codes based on linear codes
having generating matrix $G$ of dimensions $\ell \times \ell$. The rate of polarization was quantified by the exponent
of the kernel $E(G)$, which plays the general role of the threshold (equal to 0.5) appearing in (1) and (2)
(note that here $N = \ell^n$). Korada et al. showed that $E(G) \le 0.5$ for all binary linear kernels of dimension
$\ell \le 15$, where 0.5 is the kernel exponent found for Arikan's $2 \times 2$ kernel, and that for $\ell = 16$ there exists a
code generator matrix $G$ in which $E(G) = 0.51828$, and this is the maximum exponent achievable by a
binary linear kernel up to this dimension. Furthermore, for optimal linear kernels, the exponent $E(G)$
approaches 1 as $\ell \to \infty$.

In [4], Mori and Tanaka considered the general case of a mapping $g(\cdot)$, which is not necessarily linear
and binary, as a basis for channel polarization constructions. They gave sufficient conditions for polarization
and generalized the exponent for these cases. In [5], they considered non-binary, however linear,
kernels based on Reed-Solomon codes and Algebraic Geometry codes and showed that their exponents are
by far better than the exponents of the known binary kernels. This is true even for such a small kernel
dimension as $\ell = 4$ and the alphabet size $q = 4$, in which $E(G) = 0.573120$.

In this paper, we propose designing good binary kernels (in the sense of large exponent), by using
code decompositions (a.k.a. code nestings). The kernels we suggest show better exponents than the ones
considered in [3]. Moreover, we describe binary non-linear kernels of sizes 14, 15 and 16 providing a
polarization exponent superior to that of any binary linear kernel.

The paper is organized as follows. In Section 2 we describe building kernels that are related to
decompositions of codes into sub-codes. Furthermore, by using results from [4], we observe that the
exponent of these kernels is a function of the partial minimum distances between the sub-codes. We then
develop in Section 3 an upper-bound on the exponent of dimension $\ell$. In Section 4 we give examples of
known code decompositions which result in binary kernels that achieve the upper-bounds from Section 3.

2 Preliminaries 

We consider kernels that are based on bijective binary transformations. A channel polarization kernel of
dimension $\ell$, denoted by $g(\cdot)$, is a bijective mapping

$$g : \{0,1\}^{\ell} \to \{0,1\}^{\ell}.$$

This means that $g(u) = x$, $u, x \in \{0,1\}^{\ell}$. Denote the output components of the transformation by

$$g_i(u) = x_i, \quad i \in [\ell],$$

where for a natural number $\ell$, we denote $[\ell] = \{1, 2, 3, \ldots, \ell\}$. For $i \ge j$, let $u_j^i = (u_j, \ldots, u_i)$ be the
sub-vector of $u$ of length $i - j + 1$ (if $i < j$ we say that $u_j^i = ()$, the empty vector, and its length is 0). It is
convenient to denote by $g^{(v_1^i)} : \{0,1\}^{\ell - i} \to \{0,1\}^{\ell}$ the restriction of $g(\cdot)$ to the set
$\{ v_1^i u_{i+1}^{\ell} \mid u_{i+1}^{\ell} \in \{0,1\}^{\ell - i} \}$, that is

$$g^{(v_1^i)}(u_{i+1}^{\ell}) = g(v_1^i u_{i+1}^{\ell}), \quad i \in [\ell - 1].$$

Next, we consider code decompositions. The initial code is partitioned to several sub-codes having the
same size. Each of these sub-codes can be further partitioned. Here we choose as the initial code the
total space of length $\ell$ binary vectors, and denote it by $T_1^{()} = \{0,1\}^{\ell}$. This set is partitioned to $m_1$ equally
sized sub-codes $T_2^{(0)}, T_2^{(1)}, \ldots, T_2^{(m_1 - 1)}$, and each sub-code $T_2^{(b_1)}$ is in turn partitioned to $m_2$ equally sized
codes $T_3^{(b_1, 0)}, T_3^{(b_1, 1)}, \ldots, T_3^{(b_1, m_2 - 1)}$ ($b_1 \in \{0, 1, \ldots, m_1 - 1\}$). This partitioning may be further carried on.

Definition 1 The set $\{T_1, \ldots, T_m\}$ is called a decomposition of $\{0,1\}^{\ell}$, if $T_1^{()} = \{0,1\}^{\ell}$, and $T_i^{(b_1^{i-1})}$ is
partitioned into $m_i$ equally sized sets $\{ T_{i+1}^{(b_1^{i-1} b_i)} \}_{b_i \in \{0, \ldots, m_i - 1\}}$ ($i \in [m-1]$). We denote
the set of sub-codes of level number $i$ by

$$T_i = \{ T_i^{(b_1^{i-1})} \mid b_j \in \{0, 1, 2, \ldots, m_j - 1\},\ j \in [i-1] \}.$$

The partition is usually described by the following chain of codes parameters

$$(n_1, k_1, d_1) - (n_2, k_2, d_2) - \cdots - (n_m, k_m, d_m),$$

if for each $T \in T_i$ we have that $T$ is a code of length $n_i$, size $2^{k_i}$ and minimum distance at least $d_i$.

If the sub-codes of the decompositions are cosets, then we say that $\{T_1, \ldots, T_m\}$ is a decomposition into
cosets. In this case, for each $T_i$ the sub-code that contains the zero codeword is called the representative
sub-code, and a minimal weight codeword for each coset is called the coset leader. If all the sub-codes in
the decomposition are cosets of linear codes, we say that the decomposition is linear.

Example 1 As an example consider $\ell = 4$ and the $4 \times 4$ binary matrix

$$G = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}.$$

A partition into cosets, having the following chain of parameters $(4,4,1) - (4,3,2) - (4,1,4)$, can be implied
by the matrix. This is done by taking $T_1^{()} = \{0,1\}^4$, which is partitioned to the even weight codewords and
odd weight codewords cosets, i.e. $T_2^{(0)} = \{ u_1^4 \mid \sum_{i=1}^{4} u_i = 0 \pmod 2 \}$, $T_2^{(1)} = \{ u_1^4 \mid \sum_{i=1}^{4} u_i = 1 \pmod 2 \}$.
These cosets are in turn partitioned to antipodal pairs, $T_3^{(0,0)} = \{0000, 1111\}$, $T_3^{(0,1)} = \{1010, 0101\}$,
$T_3^{(0,2)} = \{1100, 0011\}$, $T_3^{(0,3)} = \{0110, 1001\}$, and $T_3^{(1,b)} = [1000] + T_3^{(0,b)}$ ($b \in \{0,1,2,3\}$). Note that
in order to describe this partition, it suffices to describe the representatives and the coset leaders for the
partition of the representatives.

A binary transformation can be associated to a code decomposition in the following way. 

Definition 2 Let $\{T_1, T_2, \ldots, T_{\ell+1}\}$ be a code decomposition of $\{0,1\}^{\ell}$, such that $m_i = 2$ for each $i \in [\ell]$.
Note that the code $T_i^{(b_1^{i-1})}$ is of size $2^{\ell - i + 1}$, and specifically $T_{\ell+1}^{(b_1^{\ell})}$ contains only one codeword. We
call such a decomposition a binary decomposition. The transformation $g(\cdot) : \{0,1\}^{\ell} \to \{0,1\}^{\ell}$ induced by
this binary code decomposition is defined as follows.

$$g(u_1^{\ell}) = x_1^{\ell} \quad \text{if } x_1^{\ell} \in T_{\ell+1}^{(u_1^{\ell})}. \tag{3}$$

Following the definition, we can observe that a sequential decision making on the bits of the input to
the transformation ($u_i$), given a noisy observation of the output, is actually a decision on the sub-code to
which the transmitted vector belongs. As such, deciding on the first bit $u_1$ is actually deciding if the
transmitted vector belongs to $T_2^{(0)}$ or to $T_2^{(1)}$. Once we decided on $u_1$, we assume that we transmitted a
codeword of $T_2^{(u_1)}$ and by deciding on $u_2$ we choose the appropriate refinement or sub-code of $T_2^{(u_1)}$, i.e.
we should decide between the candidates $T_3^{(u_1, 0)}$ and $T_3^{(u_1, 1)}$. Due to this fact, it comes as no surprise
that the Hamming distances between two candidate sub-codes play an important role when considering
the rate of polarization.

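Definition 3 For a binary code decomposition as in Definition 2, the Hamming distances between sub-codes
in the decomposition are defined as follows:

$$D_{\min}^{(i)}(u_1^{i-1}) = \min \{ d_H(c_1, c_2) \mid c_1 \in T_{i+1}^{(u_1^{i-1} 0)},\ c_2 \in T_{i+1}^{(u_1^{i-1} 1)} \},$$

$$D_{\min}^{(i)} = \min \{ D_{\min}^{(i)}(u_1^{i-1}) \mid u_1^{i-1} \in \{0,1\}^{i-1} \}.$$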

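A transformation $g(\cdot)$ can be used as a building block for a recursive construction of a transformation
of greater length, in a similar manner to [1]. We specify this construction explicitly in the next definition.

Definition 4 Given a transformation $g(\cdot)$ of dimension $\ell$, we construct a mapping $g^{(m)}(\cdot)$ of dimension
$\ell^m$ (i.e. $g^{(m)}(\cdot) : \{0,1\}^{\ell^m} \to \{0,1\}^{\ell^m}$) in the following recursive fashion.

$$g^{(1)}(u_1^{\ell}) = g(u_1^{\ell});$$

$$g^{(m)}(u_1^{\ell^m}) = \left[ g^{(m-1)}(\gamma_{1,1}, \gamma_{2,1}, \gamma_{3,1}, \ldots, \gamma_{\ell^{m-1},1}),\ g^{(m-1)}(\gamma_{1,2}, \gamma_{2,2}, \gamma_{3,2}, \ldots, \gamma_{\ell^{m-1},2}),\ \ldots,\ g^{(m-1)}(\gamma_{1,\ell}, \gamma_{2,\ell}, \gamma_{3,\ell}, \ldots, \gamma_{\ell^{m-1},\ell}) \right],$$

where $\gamma_{i,j} = g_j\left(u_{(i-1)\ell+1}^{i\ell}\right)$ for $1 \le i \le \ell^{m-1}$, $1 \le j \le \ell$.

The transformation $g^{(m)}(\cdot)$ can be used to transmit data over the B-MC channel. The method of successive
cancelation can now be used to decode, with decoding complexity of $O(2^{\ell} \cdot N \cdot \log_{\ell}(N))$ as in [1].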

We use the same channel definition, the corresponding symmetric capacity and the Bhattacharyya
parameter as in [1], [3], [4]. Note that for uniform binary random vectors $U_1^{\ell}$, and $X_1^{\ell} = g(U_1^{\ell})$, we have
that $I(Y_1^{\ell}; U_1^{\ell}) = I(Y_1^{\ell}; X_1^{\ell})$, because the transformation $g(\cdot)$ is invertible. Furthermore, since we consider
memoryless channels, we have $I(Y_1^{\ell}; X_1^{\ell}) = \ell \cdot I(Y_1; X_1) = \ell \cdot I(W)$, and on the other hand

$$I(Y_1^{\ell}; U_1^{\ell}) = \sum_{i=1}^{\ell} I(Y_1^{\ell}; U_i \mid U_1^{i-1}) = \sum_{i=1}^{\ell} I(W^{(i)}),$$

where $W^{(i)}$ denotes the channel between $U_i$ and $(Y_1^{\ell}, U_1^{i-1})$.

Define the tree process of the channels generated by the kernels, in the same way as it was done in [1]
and generalized in [3]. A random sequence $\{W_n\}_{n \ge 0}$ is defined such that $W_n \in \{W^{(i)}\}_{i=1}^{\ell^n}$ with

$$W_0 = W, \qquad W_{n+1} = W_n^{(B_{n+1} + 1)},$$

where $\{B_n\}_{n \ge 1}$ is a sequence of i.i.d. random variables uniformly distributed over the set $\{0, 1, 2, \ldots, \ell - 1\}$.
In a similar manner, the symmetric capacities corresponding to the channels, $\{I_n\}_{n \ge 0} = \{I(W_n)\}_{n \ge 0}$, and
the Bhattacharyya parameter random variables $\{Z_n\}_{n \ge 0} = \{Z(W_n)\}_{n \ge 0}$ are defined. Just as in [1,
Proposition 8], we can prove that the random sequence $\{I_n\}_{n \ge 0}$ is a bounded martingale, and it is
uniformly integrable, which means it converges almost surely to $I_{\infty}$ and that $E\{I_{\infty}\} = I(W)$. Now, if we
can show that $Z_n \to Z_{\infty}$ w.h.p. such that $Z_{\infty} \in \{0,1\}$, by the relations between the channel's information
and the Bhattacharyya parameter [2, Proposition 1], we have that $I_{\infty} \in \{0,1\}$. But this means that
$\Pr(I_{\infty} = 1) = E\{I_{\infty}\} = I(W)$, which is the channel polarization phenomenon.

Proposition 1 Let $g(\cdot)$ be a binary transformation of dimension $\ell$, induced by a binary code decomposition
$\{T_1, T_2, \ldots, T_{\ell+1}\}$. If there exists $u_1^{\ell-1} \in \{0,1\}^{\ell-1}$ such that $D_{\min}^{(\ell)}(u_1^{\ell-1}) \ge 2$, then $\Pr(I_{\infty} = 1) = I(W)$.

Proof In [4, Corollary 11], sufficient conditions are given for

$$\lim_{n \to \infty} \Pr\left(Z_n \in (\delta, 1 - \delta)\right) = 0 \quad \forall \delta \in (0, 0.5). \tag{4}$$

The first condition is that there exists a vector $u_1^{\ell-1}$, indices $i, j \in [\ell]$ and permutations $\sigma(\cdot)$ and $\tau(\cdot)$ on
$\{0,1\}$ such that

$$g_i^{(u_1^{\ell-1})}(u_{\ell}) = \sigma(u_{\ell}) \quad \text{and} \quad g_j^{(u_1^{\ell-1})}(u_{\ell}) = \tau(u_{\ell}).$$

This requirement applies here, because if there exists $u_1^{\ell-1} \in \{0,1\}^{\ell-1}$ such that $D_{\min}^{(\ell)}(u_1^{\ell-1}) \ge 2$, then
the two codewords of the code $T_{\ell}^{(u_1^{\ell-1})}$, $c_1$ and $c_2$, are at Hamming distance at least 2. This means that
there exist at least two indices $i, j$ such that $c_{1,i} \ne c_{2,i}$ and $c_{1,j} \ne c_{2,j}$, therefore $g_i^{(u_1^{\ell-1})}(u_{\ell})$ and $g_j^{(u_1^{\ell-1})}(u_{\ell})$
are both permutations. The second condition is that for any $v_1^{\ell-1} \in \{0,1\}^{\ell-1}$ there exist an index $m \in [\ell]$
and a permutation $\mu(\cdot)$ on $\{0,1\}$ such that

$$g_m^{(v_1^{\ell-1})}(u_{\ell}) = \mu(u_{\ell}).$$

This requirement also applies here, by noting that for each $v_1^{\ell-1} \in \{0,1\}^{\ell-1}$ the two codewords of the set
$T_{\ell}^{(v_1^{\ell-1})}$ are at Hamming distance at least 1. This means that (4) holds, which implies that $I_{\infty} \in \{0,1\}$
almost surely, and therefore $\Pr(I_{\infty} = 1) = I(W)$.

The next proposition on the rate of polarization is an easy consequence of [4, Theorem 19] and
Proposition 1.

Proposition 2 Let $g(\cdot)$ be a bijective transformation of dimension $\ell$, induced by the code partitioning
$\{T_1, T_2, \ldots, T_{\ell+1}\}$. If there exists $u_1^{\ell-1} \in \{0,1\}^{\ell-1}$ such that $D_{\min}^{(\ell)}(u_1^{\ell-1}) \ge 2$, then

(i) For any $\beta < E(g)$

$$\lim_{n \to \infty} \Pr\left(Z_n \le 2^{-\ell^{n\beta}}\right) = I(W),$$

(ii) For any $\beta > E(g)$

$$\lim_{n \to \infty} \Pr\left(Z_n \ge 2^{-\ell^{n\beta}}\right) = 1,$$

where $E(g) = \frac{1}{\ell} \sum_{i=1}^{\ell} \log_{\ell}\left(D_{\min}^{(i)}\right)$.

Naturally, we would like to find kernels maximizing $E(g)$. In the next section we consider upper
bounds on the maximum achievable exponent per dimension $\ell$.
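The exponent formula of Proposition 2 is straightforward to evaluate numerically. The following sketch computes $E(g)$ from a partial distance sequence; the function name `kernel_exponent` is an illustrative choice, and the sample sequences are the partial distances $(1, 2)$ of Arikan's $2 \times 2$ kernel and the length-16 sequence listed in Table 1.

```python
# Exponent E(g) = (1/l) * sum_i log_l(D_min^(i)) for a partial distance sequence.
import math


def kernel_exponent(distances):
    """Exponent of a kernel of dimension l = len(distances), l >= 2."""
    l = len(distances)
    return sum(math.log(d, l) for d in distances) / l


if __name__ == "__main__":
    print(kernel_exponent([1, 2]))  # Arikan's 2x2 kernel
    seq16 = [1, 2, 2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 8, 8, 16]
    print(round(kernel_exponent(seq16), 5))
```

The first call gives 0.5 and the second gives approximately 0.52742, matching the values quoted in the text.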



3 Bounds on the Optimal Exponent 

We define the optimal exponent per dimension $\ell$ as

$$E_{\ell} = \max_{g : \{0,1\}^{\ell} \to \{0,1\}^{\ell}} E(g). \tag{5}$$

Note that in [3], $E_{\ell}$ was defined as a maximization over the set of binary linear kernels, and here we extend
the definition for general kernels. Furthermore, a lower bound on $E_{\ell}$ using the Gilbert-Varshamov
technique also applies in this case [3, Lemma 20]. The following lemma is a generalization of [3, Lemma
18].

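Lemma 1 (Generalization of [3], Lemma 20) Let $g : \{0,1\}^{\ell} \to \{0,1\}^{\ell}$ be a polarizing kernel. Fix
$k \in [\ell - 1]$ and define a mapping

$$\tilde{g}(v_1^{\ell}) = g(v_1^{k-1}, v_{k+1}, v_k, v_{k+2}^{\ell}), \tag{6}$$

i.e. in this mapping the coordinates $k$ and $k+1$ are swapped. Let $\{D_{\min}^{(i)}\}_{i=1}^{\ell}$ and $\{\tilde{D}_{\min}^{(i)}\}_{i=1}^{\ell}$ denote the
partial distances of $g(\cdot)$ and $\tilde{g}(\cdot)$ respectively. If $D_{\min}^{(k)} > D_{\min}^{(k+1)}$ then

(i) $E(g) \le E(\tilde{g})$

(ii) $\tilde{D}_{\min}^{(k)} < \tilde{D}_{\min}^{(k+1)}$

Proof We follow the path of the proof of [3, Lemma 20]. It will be useful to introduce the following
equivalent definition of the partial distance sequence.

$$D_{\min}^{(i)} = \min \{ d_H( g(w_1^{i-1}, 0, u_{i+1}^{\ell}),\ g(w_1^{i-1}, 1, v_{i+1}^{\ell}) ) \mid w_1^{i-1}, u_{i+1}^{\ell}, v_{i+1}^{\ell} \} \tag{7}$$

According to this definition it is easy to see that

$$\tilde{D}_{\min}^{(i)} = D_{\min}^{(i)}, \quad i \in [\ell] \setminus \{k, k+1\}. \tag{8}$$

Hence, it suffices to show that

$$D_{\min}^{(k)} \cdot D_{\min}^{(k+1)} \le \tilde{D}_{\min}^{(k)} \cdot \tilde{D}_{\min}^{(k+1)} \tag{9}$$

in order to prove (i). Using (7), we have

$$D_{\min}^{(k)} = \min \{ d_H( g(w_1^{k-1}, 0, u_{k+1}^{\ell}),\ g(w_1^{k-1}, 1, v_{k+1}^{\ell}) ) \mid w_1^{k-1}, u_{k+1}^{\ell}, v_{k+1}^{\ell} \} \tag{10}$$

$$\tilde{D}_{\min}^{(k)} = \min \{ d_H( g(w_1^{k-1}, u_k, 0, u_{k+2}^{\ell}),\ g(w_1^{k-1}, w_k, 1, v_{k+2}^{\ell}) ) \mid w_1^{k-1}, u_k, w_k, u_{k+2}^{\ell}, v_{k+2}^{\ell} \} \tag{11}$$

$$D_{\min}^{(k+1)} = \min \{ d_H( g(w_1^{k}, 0, u_{k+2}^{\ell}),\ g(w_1^{k}, 1, v_{k+2}^{\ell}) ) \mid w_1^{k}, u_{k+2}^{\ell}, v_{k+2}^{\ell} \} \tag{12}$$

$$\tilde{D}_{\min}^{(k+1)} = \min \{ d_H( g(w_1^{k-1}, 0, w_{k+1}, u_{k+2}^{\ell}),\ g(w_1^{k-1}, 1, w_{k+1}, v_{k+2}^{\ell}) ) \mid w_1^{k-1}, w_{k+1}, u_{k+2}^{\ell}, v_{k+2}^{\ell} \} \tag{13}$$

Because the set on which we perform the minimization in (13) is a subset of the set on which we perform
the minimization in (10), we have that $D_{\min}^{(k)} \le \tilde{D}_{\min}^{(k+1)}$. On the other hand, the minimization in (11) can
be expressed as $\tilde{D}_{\min}^{(k)} = \min \{ \Delta_1, \Delta_2 \}$, where

$$\Delta_1 = \min \{ d_H( g(w_1^{k-1}, w_k, 0, u_{k+2}^{\ell}),\ g(w_1^{k-1}, w_k, 1, v_{k+2}^{\ell}) ) \mid w_1^{k-1}, u_{k+2}^{\ell}, v_{k+2}^{\ell}, w_k \} \tag{14}$$

$$\Delta_2 = \min \{ d_H( g(w_1^{k-1}, w_k, 0, u_{k+2}^{\ell}),\ g(w_1^{k-1}, 1 - w_k, 1, v_{k+2}^{\ell}) ) \mid w_1^{k-1}, u_{k+2}^{\ell}, v_{k+2}^{\ell}, w_k \} \tag{15}$$

We see that $\Delta_1 = D_{\min}^{(k+1)}$ and $\Delta_2 \ge D_{\min}^{(k)}$. So, $\tilde{D}_{\min}^{(k)} = D_{\min}^{(k+1)}$, because $D_{\min}^{(k)} > D_{\min}^{(k+1)}$. So this proves
(9) and therefore (i). Now,

$$\tilde{D}_{\min}^{(k)} = D_{\min}^{(k+1)} < D_{\min}^{(k)} \le \tilde{D}_{\min}^{(k+1)},$$

which results in (ii).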
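Lemma 1 can be illustrated numerically on a small linear kernel. The sketch below is an assumption-laden toy example: it restricts attention to linear kernels $u \mapsto uG$, for which $D_{\min}^{(i)}$ equals the minimum weight of row $i$ plus any GF(2) combination of the rows below it, and for which swapping the input coordinates $k$ and $k+1$ amounts to swapping rows $k$ and $k+1$ of $G$. The matrix `G` is a hypothetical example chosen so that $D_{\min}^{(2)} > D_{\min}^{(3)}$.

```python
# Toy check of Lemma 1 for a linear kernel (hypothetical 3x3 example).
from itertools import product
import math


def partial_distances_linear(rows):
    """D_min^(i) of u -> u*G: min weight of row i plus combinations of lower rows."""
    l = len(rows)
    out = []
    for i in range(l):
        below = rows[i + 1:]
        best = l
        for coeffs in product((0, 1), repeat=len(below)):
            word = list(rows[i])
            for c, r in zip(coeffs, below):
                if c:
                    word = [a ^ b for a, b in zip(word, r)]
            best = min(best, sum(word))
        out.append(best)
    return out


def exponent(d):
    """Kernel exponent for the partial distance sequence d."""
    l = len(d)
    return sum(math.log(x, l) for x in d) / l


G = [(1, 0, 0), (1, 1, 1), (0, 0, 1)]          # partial distances (1, 2, 1)
G_swapped = [G[0], G[2], G[1]]                 # inputs 2 and 3 swapped

if __name__ == "__main__":
    d, d_sw = partial_distances_linear(G), partial_distances_linear(G_swapped)
    print(d, exponent(d))
    print(d_sw, exponent(d_sw))
```

In this example the swap turns the sequence $(1, 2, 1)$ into $(1, 1, 3)$, so the last two partial distances become increasing and the exponent grows.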
Lemma 1 implies that when seeking the optimal exponent, $E_{\ell}$, for a given dimension $\ell$, it suffices
to consider kernels with non-decreasing partial distance sequences. This observation also results in [3,
Lemma 22].

Lemma 2 ([3], Lemma 22) Let $d(n,k)$ denote the largest possible minimum distance of a binary code
of length $n$ and size $2^k$. Then,

$$E_{\ell} \le \frac{1}{\ell} \sum_{i=1}^{\ell} \log_{\ell}\left(d(\ell, \ell - i + 1)\right). \tag{16}$$

Proof Consider a polarizing kernel $g(\cdot)$ having partial distance sequence $\{D_{\min}^{(i)}\}_{i=1}^{\ell}$. Because of Lemma
1 we can assume that the sequence is non-decreasing (otherwise, we can find a kernel that has a
non-decreasing sequence with at least the same exponent). Note that

$$D_{\min}^{(i)} = \min_{j \ge i} D_{\min}^{(j)} = \min_{u_1^{i-1}} \left\{ \min \{ d_H(c_1, c_2) \mid c_1, c_2 \in T_i^{(u_1^{i-1})},\ c_1 \ne c_2 \} \right\} \le d(\ell, \ell - i + 1), \tag{17}$$

where the inequality is due to the fact that each of the codes in the inner minimum (i.e. $T_i^{(u_1^{i-1})}$)
is of size $2^{\ell - i + 1}$ and length $\ell$.

As already noted in [3], the shortcoming of (16) as an upper-bound is that the dependencies between
the partial distances are not exploited. For binary and linear kernels, [3, Lemma 26] gives an improved
upper bound utilizing these dependencies. In the sequel we develop an upper bound that is applicable to
general kernels. The basic idea of the bound we develop is to express the partial distance sequence of a
kernel in terms of distance distributions of a code.

For a code $C$ of length $\ell$ and size $M$ we define the distance distribution as

$$B_i = \frac{1}{M} \left| \{ (c_1, c_2) \mid d_H(c_1, c_2) = i \} \right|, \quad 0 \le i \le \ell. \tag{18}$$

Note that $B_0 = 1$ and

$$\sum_{i=1}^{\ell} B_i = M - 1. \tag{19}$$



Now, given a non-decreasing partial distance sequence $\{D_{\min}^{(i)}\}_{i=1}^{\ell}$, we choose an arbitrary $k \in [\ell]$ and
consider the sub-sequence $\{D_{\min}^{(i)}\}_{i=k}^{\ell}$. Using the reasoning that led to (17), we observe that we need to
consider the sub-codes $\{ T_k^{(u_1^{k-1})} \}_{u_1^{k-1} \in \{0,1\}^{k-1}}$ of size $M = 2^{\ell - k + 1}$, but whereas in (16) we considered only
the minimum distance, here we may have additional requirements from the distance distribution of the
code. Let's begin by understanding the meaning of $D_{\min}^{(\ell)}$ (the last element of the sequence). By definition,
the code $T_k^{(u_1^{k-1})}$ is decomposed into $M/2$ sub-codes of size 2, such that in each one the distance between
the 2 codewords is at least $D_{\min}^{(\ell)}$. This means that we must fulfill the following requirement

$$\sum_{i = D_{\min}^{(\ell)}}^{\ell} B_i \ge 1, \tag{20}$$

where $\{B_i\}_{i=0}^{\ell}$ is the distance distribution of $T_k^{(u_1^{k-1})}$. Now, let's proceed to $D_{\min}^{(\ell-1)}$. This item implies
that there are $M/4$ sub-codes of $T_k^{(u_1^{k-1})}$ of 4 codewords, each of which can be decomposed into 2
sub-codes of 2 codewords having minimum distance between the sub-codes of at least $D_{\min}^{(\ell-1)}$. From this,
we deduce that there are $2 \cdot 2^{\ell - k + 1}$ pairs of codewords having their distance at least $D_{\min}^{(\ell-1)}$. These pairs
are an addition to the ones we counted in (20). Thus, because we assume that the partial distance
sequence is non-decreasing, we have the following requirement.

$$\sum_{i = D_{\min}^{(\ell-1)}}^{\ell} B_i \ge 3. \tag{21}$$

Note that if $D_{\min}^{(\ell-1)} = D_{\min}^{(\ell)}$ then (20) is redundant given (21). In the general case, when considering
$D_{\min}^{(\ell-r)}$, where $0 \le r \le \ell - k$, we take into account $M / 2^{r+1}$ sub-codes of $T_k^{(u_1^{k-1})}$, each one of size $2^{r+1}$, and
each one can be partitioned into two sub-codes of which the minimum distance between them is $D_{\min}^{(\ell-r)}$.
So, there are $2 \cdot \frac{M}{2^{r+1}} \cdot (2^r)^2 = M \cdot 2^r$ codeword pairs (that were not counted at the previous steps) such
that their distance is at least $D_{\min}^{(\ell-r)}$. Summarizing, we get the following set of $\ell - k + 1$ inequalities

$$\sum_{i = D_{\min}^{(\ell-r)}}^{\ell} B_i \ge \sum_{j=0}^{r} 2^j = 2^{r+1} - 1, \quad 0 \le r \le \ell - k. \tag{22}$$
By Delsarte [6], we can specify additional linear requirements on the distance distribution, by

$$\sum_{j=1}^{\ell} B_j \cdot P_i(j) \ge -\binom{\ell}{i}, \quad 0 \le i \le \ell, \tag{23}$$

where $P_k(x)$ is the Krawtchouk polynomial, which is defined as

$$P_k(x) = \sum_{m=0}^{k} (-1)^m \binom{x}{m} \binom{\ell - x}{k - m}. \tag{24}$$

In addition, the following is also an obvious requirement

$$B_i \ge 0, \quad i \in [\ell]. \tag{25}$$

We see that requirements (19), (22), (23) and (25) are all linear. A partial distance sequence that corresponds
to a kernel must be able to fulfill these requirements for every $k \in [\ell]$. So, taking the maximum
exponent corresponding to a partial distance sequence that fulfills the requirements for each $k \in [\ell]$ results
in an upper-bound on the exponent. Checking the validity of a sequence can be done by linear programming
methods (we need to check if the polytope is not empty). We now turn to give two simple examples
of the method, and after them we present a variation on this development that leads to a stronger bound.
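To make the linear requirements concrete, the following sketch evaluates the Krawtchouk polynomial, computes the distance distribution of a small code, and tests the Delsarte inequalities. It is only an illustration: the function names are hypothetical, a small numerical tolerance is added for floating-point safety, and the even-weight code of length 4 is used as a sample code. The last two calls test candidate distributions with $B_3 + B_4 = 3$ for $\ell = 4$ against the inequalities.

```python
# Krawtchouk polynomials, distance distributions and Delsarte's inequalities
# (illustrative sketch; names and tolerance are assumptions).
from itertools import product
from math import comb


def krawtchouk(l, k, x):
    """P_k(x) = sum_m (-1)^m C(x, m) C(l - x, k - m) for integer 0 <= x <= l."""
    return sum((-1) ** m * comb(x, m) * comb(l - x, k - m) for m in range(k + 1))


def distance_distribution(code):
    """B_i = (1/M) |{(c1, c2) : d_H(c1, c2) = i}| over ordered pairs."""
    l, M = len(code[0]), len(code)
    B = [0] * (l + 1)
    for c1 in code:
        for c2 in code:
            B[sum(a != b for a, b in zip(c1, c2))] += 1
    return [b / M for b in B]


def satisfies_delsarte(B):
    """Check sum_{j>=1} B_j P_i(j) >= -C(l, i) for all 0 <= i <= l."""
    l = len(B) - 1
    return all(
        sum(B[j] * krawtchouk(l, i, j) for j in range(1, l + 1)) >= -comb(l, i) - 1e-9
        for i in range(l + 1)
    )


if __name__ == "__main__":
    even = [c for c in product((0, 1), repeat=4) if sum(c) % 2 == 0]
    B = distance_distribution(even)
    print(B, satisfies_delsarte(B))
    print(satisfies_delsarte([1, 0, 0, 3, 0]), satisfies_delsarte([1, 0, 0, 0, 3]))
```

A full validity check of a partial distance sequence would combine these inequalities with the sum constraints in a linear program; that step is not shown here.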



Example 2 Consider $\ell = 3$. Let $\{D_{\min}^{(k)}\}_{k=1}^{3}$ be the partial distance sequence. Note first that by the
Singleton bound $D_{\min}^{(k)} \le k$. We first consider the possibility that $D_{\min}^{(3)} = 3$ and $D_{\min}^{(2)} = 2$. This assumption
is translated by (19) and (22) to

$$B_2 + B_3 = 3, \quad B_3 \ge 1, \quad B_2, B_3 \ge 0. \tag{26}$$

By (23) for $i = 1$ we have

$$B_2 \cdot P_1(2) + B_3 \cdot P_1(3) \ge -3$$

$$-B_2 - 3 \cdot B_3 \ge -3 \ \overset{(26)}{\Longrightarrow}\ (3 - B_3) + 3 B_3 \le 3 \ \Longrightarrow\ B_3 \le 0, \tag{27}$$

which is a contradiction to (26). The next best candidate is a sequence having $D_{\min}^{(3)} = D_{\min}^{(2)} = 2$; this
sequence is feasible, as can be seen by considering a suitable generating matrix.

This proves that $E_3 = \frac{1}{3} \log_3 4 \approx 0.42062$.

Example 3 Consider $\ell = 4$. Let $\{D_{\min}^{(k)}\}_{k=1}^{4}$ be the partial distance sequence. We first consider the
possibility that $D_{\min}^{(4)} = D_{\min}^{(3)} = 3$ (if this possibility is eliminated it means that $D_{\min}^{(4)} = 4$, $D_{\min}^{(3)} = 3$ is
also not possible). (19) and (22) are translated to

$$B_3 + B_4 = 3, \quad B_3, B_4 \ge 0. \tag{28}$$

By (23) for $i = 1$ we have

$$B_3 \cdot P_1(3) + B_4 \cdot P_1(4) \ge -4$$

$$-2 \cdot B_3 - 4 \cdot B_4 \ge -4 \ \overset{(28)}{\Longrightarrow}\ B_3 + 2 (3 - B_3) \le 2 \ \Longrightarrow\ B_3 \ge 4,$$

which is a contradiction to (28). The next best candidate is

$$D_{\min}^{(4)} = 4, \quad D_{\min}^{(3)} = 2, \quad D_{\min}^{(2)} = 2, \quad D_{\min}^{(1)} = 1,$$

which can be achieved by a binary linear kernel induced by the generating matrix

$$\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}^{\otimes 2}.$$

This proves that $E_4 = 0.5$.

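The idea of transforming the partial distance sequence into requirements on distance distributions can
be further refined. As we did before, we begin our discussion by considering the sub-sequence $\{D_{\min}^{(i)}\}_{i=k}^{\ell}$.
We start by giving an interpretation to $D_{\min}^{(\ell)}$ (the last element of the sequence). By definition, the code
$T_k^{(u_1^{k-1})}$ is decomposed into $M/2$ sub-codes of size 2, where in each one the distance between the 2
codewords is at least $D_{\min}^{(\ell)}$. Denote by $B_i^{(u_1^{\ell-1})}$, $i \in [\ell]$, the partial distance distribution of the sub-code
$T_{\ell}^{(u_1^{\ell-1})}$ of the code $T_k^{(u_1^{k-1})}$. By definition we have

$$B_i^{(u_1^{\ell-1})} = \frac{1}{2} \left| \{ (c_1, c_2) \mid d_H(c_1, c_2) = i,\ c_1, c_2 \in T_{\ell}^{(u_1^{\ell-1})} \} \right|. \tag{29}$$

Obviously,

$$\sum_{i = D_{\min}^{(\ell)}}^{\ell} B_i^{(u_1^{\ell-1})} = 1 \quad \forall u_k^{\ell-1} \in \{0,1\}^{\ell-k}, \tag{30}$$

$$\sum_{j=1}^{\ell} B_j^{(u_1^{\ell-1})} P_i(j) \ge -\binom{\ell}{i}, \quad 0 \le i \le \ell,\ \forall u_k^{\ell-1} \in \{0,1\}^{\ell-k}. \tag{31}$$

Define the average of this distribution over all the sub-codes of $T_k^{(u_1^{k-1})}$, i.e.

$$B_i^{(\ell)} = \frac{1}{2^{\ell-k}} \sum_{u_k^{\ell-1} \in \{0,1\}^{\ell-k}} B_i^{(u_1^{\ell-1})}. \tag{32}$$

Note that

$$B_i^{(\ell)} = \frac{1}{M} \left| \{ (c_1, c_2) \mid d_H(c_1, c_2) = i,\ c_1, c_2 \in T_{\ell}^{(u_1^{\ell-1})},\ u_k^{\ell-1} \in \{0,1\}^{\ell-k} \} \right| \tag{33}$$

and

$$\sum_{i = D_{\min}^{(\ell)}}^{\ell} B_i^{(\ell)} = 1, \tag{34}$$

$$\sum_{j=1}^{\ell} B_j^{(\ell)} P_i(j) \ge -\binom{\ell}{i}, \quad 0 \le i \le \ell. \tag{35}$$

Let's proceed to $D_{\min}^{(\ell-1)}$. By definition, the code $T_k^{(u_1^{k-1})}$ is decomposed into $M/4$ sub-codes of size 4,
where each one consists of 2 sub-codes of size 2 whose distance from each other is at least $D_{\min}^{(\ell-1)}$. Denote by
$B_i^{(u_1^{\ell-2})}$, $i \in [\ell]$, the distance distribution of the sub-code $T_{\ell-1}^{(u_1^{\ell-2})}$ of the code $T_k^{(u_1^{k-1})}$,

$$B_i^{(u_1^{\ell-2})} = \frac{1}{4} \left| \{ (c_1, c_2) \mid d_H(c_1, c_2) = i,\ c_1, c_2 \in T_{\ell-1}^{(u_1^{\ell-2})} \} \right|. \tag{36}$$

Note that

$$B_i^{(u_1^{\ell-2})} \ge \frac{1}{2} \left( B_i^{(u_1^{\ell-2} 0)} + B_i^{(u_1^{\ell-2} 1)} \right), \quad i \in [\ell]. \tag{37}$$

So by introducing the average distance distribution

$$B_i^{(\ell-1)} = \frac{1}{2^{\ell-k-1}} \sum_{u_k^{\ell-2} \in \{0,1\}^{\ell-k-1}} B_i^{(u_1^{\ell-2})}, \tag{38}$$

we get

$$\sum_{i = D_{\min}^{(\ell-1)}}^{\ell} B_i^{(\ell-1)} = 3, \tag{39}$$

$$\sum_{j=1}^{\ell} B_j^{(\ell-1)} P_i(j) \ge -\binom{\ell}{i}, \quad 0 \le i \le \ell, \tag{40}$$

and

$$B_i^{(\ell-1)} - B_i^{(\ell)} \ge 0, \quad D_{\min}^{(\ell)} \le i \le \ell. \tag{41}$$

In the general case, when taking $D_{\min}^{(\ell-r)}$ into account, where $0 \le r \le \ell - k$, we essentially consider the
$M / 2^{r+1}$ sub-codes of $T_k^{(u_1^{k-1})}$, each one of size $2^{r+1}$, and each one can be partitioned into two sub-codes of
size $2^r$ of which the minimum distance between them is $D_{\min}^{(\ell-r)}$. Denote the distance distribution of the
sub-code $T_{\ell-r}^{(u_1^{\ell-r-1})}$ as $\{ B_i^{(u_1^{\ell-r-1})} \}_{i \in [\ell]}$ and the average distance distribution as $\{ B_i^{(\ell-r)} \}_{i \in [\ell]}$. We
have

$$B_i^{(u_1^{\ell-r-1})} = \frac{1}{2^{r+1}} \left| \{ (c_1, c_2) \mid d_H(c_1, c_2) = i,\ c_1, c_2 \in T_{\ell-r}^{(u_1^{\ell-r-1})} \} \right|, \tag{42}$$

$$B_i^{(\ell-r)} = \frac{1}{2^{\ell-k-r}} \sum_{u_k^{\ell-r-1} \in \{0,1\}^{\ell-k-r}} B_i^{(u_1^{\ell-r-1})}, \quad i \in [\ell], \tag{43}$$

which results in

$$\sum_{i = D_{\min}^{(\ell-r)}}^{\ell} B_i^{(\ell-r)} = 2^{r+1} - 1, \quad 0 \le r \le \ell - k, \tag{44}$$

$$\sum_{j=1}^{\ell} B_j^{(\ell-r)} P_i(j) \ge -\binom{\ell}{i}, \quad 0 \le i \le \ell, \tag{45}$$

$$B_i^{(\ell-r)} - B_i^{(\ell-r+1)} \ge 0, \quad 1 \le r \le \ell - k,\ D_{\min}^{(\ell-r+1)} \le i \le \ell. \tag{46}$$

We summarize this development.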
#   $\ell$   optimal sequence                       upper bound on $E_{\ell}$
1   12       1,2,2,2,2,4,4,4,6,6,6,12               0.49605
2   13       1,2,2,2,2,4,4,4,6,6,6,8,10             0.500498
3   14       1,2,2,2,2,4,4,4,6,6,6,8,8,8            0.50194
4   15       1,2,2,2,2,4,4,4,6,6,6,8,8,8,8          0.507733
5   16       1,2,2,2,2,4,4,4,6,6,6,8,8,8,8,16       0.52742

Table 1: Upper bound on $E_{\ell}$ per different dimensions



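Definition 5 Let $\{D_k\}_{k=1}^{\ell}$ be a monotone non-decreasing sequence of non-negative integral numbers, such
that $D_i \le d(\ell, \ell - i + 1)$. We say that this sequence is $\ell$-dimension Linear Programming (LP) valid if the
polytope defined by the following constraints on the non-negative variables $\{ B_i^{(k)} \}$, $1 \le k \le \ell$, $D_k \le i \le \ell$,

$$\sum_{i = D_{\ell-r}}^{\ell} B_i^{(\ell-r)} = \sum_{j=0}^{r} 2^j = 2^{r+1} - 1, \quad 0 \le r \le \ell - 1, \tag{47}$$

$$B_i^{(\ell-r)} - B_i^{(\ell-r+1)} \ge 0, \quad 1 \le r \le \ell - 1,\ D_{\ell-r+1} \le i \le \ell, \tag{48}$$

$$\sum_{j = D_{\ell-r}}^{\ell} B_j^{(\ell-r)} P_i(j) \ge -\binom{\ell}{i}, \quad 0 \le i \le \ell,\ 0 \le r \le \ell - 1, \tag{49}$$

is not empty.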



Proposition 3 If $\{D_{\min}^{(i)}\}_{i=1}^{\ell}$ is a partial distance sequence corresponding to some binary $\ell$-dimension
kernel $g(\cdot)$, then $\{D_{\min}^{(i)}\}_{i=1}^{\ell}$ is an $\ell$-dimension LP-valid sequence.

We denote by $V_{LP}$ the set of $\ell$-dimension LP-valid sequences. The following proposition is an easy
consequence of Proposition 3.

Proposition 4

$$E_{\ell} \le \max_{\{D_i\}_{i=1}^{\ell} \in V_{LP}} \frac{1}{\ell} \sum_{i=1}^{\ell} \log_{\ell}(D_i). \tag{50}$$



It should be noted that the method of Proposition 4 can be easily generalized to non-binary kernels using
the appropriate (non-binary) Krawtchouk polynomials. We computed the bound for several instances of $\ell$
by carefully enumerating the sequences in $V_{LP}$ using Wolfram's Mathematica LP-Solver. Table 1 contains
the results for $12 \le \ell \le 16$. In the next section, we give examples of good kernels, derived by
utilizing results about known code decompositions, for $14 \le \ell \le 16$ that achieve the optimal exponent.



4 Designing Kernels by Known Code Decompositions 

As we noticed in Section 2, the exponent, $E(g)$, is influenced by Hamming distances between the subsets in
the binary partition $\{T_1, \ldots, T_{\ell+1}\}$. In this section, we use a particular method for getting good distances by
using known decompositions, which are not necessarily binary decompositions. The following observation
links general decompositions and binary decompositions.

Observation 1 If there exists a code decomposition of $\{0,1\}^{\ell}$ with the following chain of parameters

$$(\ell, k_1, d_1) - (\ell, k_2, d_2) - \cdots - (\ell, k_m, d_m),$$

then there exists a binary code decomposition of $\{0,1\}^{\ell}$, such that

$$D_{\min}^{(i)} \ge d_j \quad \text{where } k_{j+1} < \ell - i + 1 \le k_j,$$

$j \in [m]$, $i \in [\ell]$, $k_{m+1} = 0$.

#   $\ell$   chain description                                      lower bound on $E(g)$
1   16       (16,1) - (15,2) - (11,4) - (8,6) - (5,8) - (1,16)     0.52742
2   16       (16,1) - (15,2) - (11,4) - (7,6) - (5,8) - (1,16)     0.51828
3   15       (15,1) - (14,2) - (10,4) - (7,6) - (4,8)              0.50773
4   14       (14,1) - (13,2) - (9,4) - (6,6) - (3,8)               0.50193

Table 2: Code decompositions from [7, Table 5] with their corresponding lower bounds on kernel exponents
for the kernels induced by them.
The next observation about the kernel exponent is an easy consequence of the previous observation.

Observation 2 If there exists a code decomposition of $\{0,1\}^{\ell}$ with the following chain of parameters

$$(\ell, k_1, d_1) - (\ell, k_2, d_2) - \cdots - (\ell, k_m, d_m),$$

then there exists an $\ell$-dimensional binary kernel $g(\cdot)$ induced by a binary code decomposition $\{T_1, \ldots, T_{\ell+1}\}$
such that

$$E(g) \ge \frac{1}{\ell} \sum_{i=1}^{m} (k_i - k_{i+1}) \cdot \log_{\ell}(d_i), \tag{51}$$

where $k_{m+1} = 0$.
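The two observations can be turned into a short computation. The sketch below expands a chain of parameters into the guaranteed partial distances of Observation 1 and evaluates the lower bound (51); the function names are illustrative, and each chain is written, as in Table 2, as a list of pairs $(k_j, d_j)$ with the common length $\ell$ passed separately.

```python
# Lower bounds from a chain of parameters (Observations 1 and 2).
# chain: list of (k_j, d_j) with k_1 = l; k_{m+1} = 0 is appended internally.
import math


def chain_to_distances(l, chain):
    """Guaranteed lower bounds on D_min^(i), i = 1..l (Observation 1)."""
    ks = [k for k, _ in chain] + [0]
    out = []
    for i in range(1, l + 1):
        size_exp = l - i + 1
        for j, (_, d) in enumerate(chain):
            if ks[j + 1] < size_exp <= ks[j]:
                out.append(d)
                break
    return out


def exponent_lower_bound(l, chain):
    """Right-hand side of (51)."""
    ks = [k for k, _ in chain] + [0]
    return sum((ks[j] - ks[j + 1]) * math.log(d, l)
               for j, (_, d) in enumerate(chain)) / l


CHAIN_1 = [(16, 1), (15, 2), (11, 4), (8, 6), (5, 8), (1, 16)]
CHAIN_2 = [(16, 1), (15, 2), (11, 4), (7, 6), (5, 8), (1, 16)]

if __name__ == "__main__":
    print(chain_to_distances(16, CHAIN_1))
    print(round(exponent_lower_bound(16, CHAIN_1), 5))
    print(round(exponent_lower_bound(16, CHAIN_2), 5))
```

For the first chain the expanded sequence coincides with the length-16 sequence of Table 1, and the two bounds evaluate to approximately 0.52742 and 0.51828.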

In [7, Table 5], the author gives a list of code decompositions for $\ell \le 16$. Using this list, we can
construct polarizing non-linear kernels and get lower bounds on their exponent $E(g)$ (in order to do so,
we use Observation 2 and Propositions 1 and 2). Table 2 contains a list of code decompositions that give
lower bounds on $E(g)$ that are greater than 0.5. In the chain description column of the table, the code
length equals $\ell$ for all the sub-codes, and was omitted from the chain for brevity. Note that the second
entry of the table has the exponent of the kernel suggested in [3]. It was proven that this is the best
linear binary kernel of dimension 16, and that all the linear kernels of dimension $< 16$ have exponents
$\le 0.5$. The first entry of the table gives a non-linear decomposition resulting in a non-linear kernel having
a better exponent. In fact, this exponent is even better than all the exponents that were recorded in [3,
Table 1]. Furthermore, entries 1, 3 and 4 achieve the optimal exponent per their dimension as Table 1
indicates. Thus, the exponent value indicated in Table 2 is not just a lower bound, but rather the true
exponent. The appendix contains details about the decompositions in Table 2.

5 Conclusions 

The notion of code decomposition was used for the design of good binary kernels in the sense of the polar
code exponent. Some of the kernels we suggested are proven to achieve the optimal exponent per their
dimension. It should be noted that by using non-binary kernels one can get better exponents, as was
demonstrated in [5]. There is an essential loss when using non-binary code decompositions for designing
binary kernels. It seems that if we allow the inputs of the kernel to be from different alphabet sizes, we
may gain an additional improvement. This interesting idea is further explored in a sequel paper by the
authors [8].






Appendix 



In this appendix we give details on the decompositions enumerated in Table 2. All of the decompositions
are coset decompositions, so we only need to specify the sub-code representatives.

#1) (16,16,1) - (16,15,2) - (16,11,4) - (16,8,6) - (16,5,8) - (16,1,16)

The sub-code representatives are (16,15,2) - single parity check code, (16,11,4) - extended Hamming code,
(16,8,6) - Nordstrom-Robinson code, (16,5,8) - first order Reed-Muller code, (16,1,16) - repetition code.

#2) (16,16,1) - (16,15,2) - (16,11,4) - (16,7,6) - (16,5,8) - (16,1,16)

The sub-code representatives are (16,15,2) - single parity check code, (16,11,4) - extended Hamming
code, (16,7,6) - extended 2-error correcting BCH code, (16,5,8) - first order Reed-Muller code, (16,1,16)
- repetition code.

#3) (15,15,1) - (15,14,2) - (15,10,4) - (15,7,6) - (15,4,8)

The sub-code representatives are (15,14,2) - single parity check code, (15,10,4) - shortened extended
Hamming code, (15,7,6) - shortened Nordstrom-Robinson code, (15,4,8) - shortened first order Reed-Muller
code.

#4) (14,14,1) - (14,13,2) - (14,9,4) - (14,6,6) - (14,3,8)

The sub-code representatives are (14,13,2) - single parity check code, (14,9,4) - twice shortened extended
Hamming code, (14,6,6) - twice shortened Nordstrom-Robinson code, (14,3,8) - twice shortened first
order Reed-Muller code.

Explicit Encoding of Decomposition #1 

For decomposition #1 we elaborate on the kernel mapping function $g(\cdot) : \{0,1\}^{16} \to \{0,1\}^{16}$. To do so,
we use Table 3. The third column from the left determines whether the vectors in the second column
are all the coset vectors (if they do not form a linear space) or just the basis for the space of coset
vectors (if they form a linear space). The fourth and the fifth columns determine the stage of the code
decomposition these vectors belong to; the "main code" is decomposed to cosets of the "sub-code" (each
coset is generated by adding a different coset vector from the set specified by column 2 to the sub-code).
The entry corresponding to indices 9 - 11 is taken from [5].

We now describe the encoding process. Let $u_1^{16}$ be a binary vector. The indices of the vector are
partitioned to subsets according to the first column of the table. For each subset the corresponding
sub-vector of $u$ is mapped to a coset vector. The mapping can be arbitrary, but when the coset vectors form
a linear space, we usually prefer to multiply the corresponding sub-vector by a generating matrix whose
rows are the vectors in the "coset vectors" column. To get the value of $g(u)$, we add up the six coset
vectors we got from the last step. Note that using this mapping definition, it is easy to derive the mapping
function corresponding to decompositions #3 and #4 as well.
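The encoding procedure described above can be sketched in a few lines. Since it is only meant to show the mechanics, the example below uses the small decomposition of Example 1 ($\ell = 4$) rather than decomposition #1; the stage layout (number of input bits, coset vectors, linear flag) and all names are hypothetical. A second stage list spells out one stage as an explicit list of cosets to illustrate the non-linear case, with the sub-vector read as a binary index (one arbitrary choice of mapping).

```python
# Table-driven kernel encoding (illustrative sketch on the l = 4 decomposition
# of Example 1; the stage layout is an assumption made for this example).
from itertools import product


def xor(a, b):
    """Component-wise sum over GF(2)."""
    return tuple(x ^ y for x, y in zip(a, b))


# Each stage: (number of input bits, coset vectors, vectors form a linear space?)
STAGES = [
    (1, [(1, 0, 0, 0)], True),
    (2, [(1, 1, 0, 0), (1, 0, 1, 0)], True),
    (1, [(1, 1, 1, 1)], True),
]

# Same decomposition, with the middle stage given as an explicit coset list.
STAGES_EXPLICIT = [
    (1, [(1, 0, 0, 0)], True),
    (2, [(0, 0, 0, 0), (1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 1, 0)], False),
    (1, [(1, 1, 1, 1)], True),
]


def stage_vector(bits, vectors, is_linear):
    """Map a sub-vector of u to a coset vector."""
    if is_linear:
        out = (0,) * len(vectors[0])
        for b, v in zip(bits, vectors):
            if b:
                out = xor(out, v)   # multiply by the generating matrix
        return out
    return vectors[int("".join(map(str, bits)), 2)]  # table lookup


def encode(u, stages=STAGES):
    """g(u): add up the coset vectors selected by the sub-vectors of u."""
    x, pos = (0,) * len(u), 0
    for nbits, vectors, is_linear in stages:
        x = xor(x, stage_vector(u[pos:pos + nbits], vectors, is_linear))
        pos += nbits
    return x


if __name__ == "__main__":
    for u in product((0, 1), repeat=4):
        print(u, encode(u))
```

For decomposition #1 the same loop would run over six stages, with the coset vectors of Table 3 in place of the toy vectors used here.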
vector     coset vectors          coset vectors form    main code     sub-code
indices                           a linear space?

1          [0000000000000001]     yes                   (16,16,1)     (16,15,2)

2 - 5      [0000000100000001]     yes                   (16,15,2)     (16,11,4)
           [0000000000010001]
           [0000000000000101]
           [0000000000000011]

6 - 8      [0001000100010001]     yes                   (16,11,4)     (16,8,6)
           [0000010100000101]
           [0000000001010101]

9 - 11     [0000000000000000]     no                    (16,8,6)      (16,5,8)
           [illegible]
           [0001001000101110]
           [0001011100011000]
           [0000011000110101]
           [0001010001110010]
           [0000010101101100]

12 - 15    [0101010101010101]     yes                   (16,5,8)      (16,1,16)
           [0011001100110011]
           [0000111100001111]
           [0000000011111111]

16         [1111111111111111]     yes                   (16,1,16)

Table 3: Coset vectors for code decomposition #1.